Point of Interest - Individual report: Do Filming Hotspots Cluster Around Transit, Scenic Architecture, or Redeveloped Areas?

Spatial Mechanisms Linking Film Production and Neighborhood Change in New York City

Author

Yongqiang Zhou

Published

December 18, 2025

Introduction

New York City is one of the most filmed cities in the world, hosting thousands of film and television permits annually across its five boroughs. These filming activities are not randomly distributed in space; they reflect an interaction between urban form, infrastructure accessibility, aesthetic appeal, and recent patterns of redevelopment. Understanding whether filming “hotspots” cluster around specific urban features matters for both scholarly and practical reasons. Academically, filming permits provide a novel proxy for cultural production and place desirability. For policy, identifying systematic clustering can inform permitting practices, neighborhood impact mitigation, and long-term urban planning. Although the street and parking disruptions caused by any single shoot are short-lived, the repeated selection of certain neighborhoods for filming raises a deeper urban question: what characteristics make some areas persistently attractive to film production, and what does that reveal about neighborhood change?

This report addresses the overarching question (OQ) motivating this project:

Note

Does film production activity act as an early signal — or even a catalyst — for rising real estate values in New York City neighborhoods?

Rather than attempting to answer the full causal question directly, this individual report contributes a spatial-mechanism analysis by focusing on the following specific question (SQ):

Note

Do filming hotspots cluster around transit access, scenic or historic architecture, or recently redeveloped and gentrifying areas?

This SQ is essential to the OQ because film production does not select locations randomly. If filming clusters disproportionately in areas with strong transit access, preserved architectural character, or recent economic upgrading, then film permits may act as a revealed-preference signal for neighborhoods already undergoing change — or potentially reinforce those changes through visibility, branding, and temporary economic shocks.

To answer this question, I construct a ZIP-code–level spatial dataset for New York City that integrates:

  • Film permit records (2013–2023)

  • Subway entrance locations

  • Historic district boundaries

  • Census-based income and rent measures

  • Property sales price trends

Using spatial joins, distance calculations, intensity measures, and interactive visualizations, I evaluate whether filming hotspots systematically co-locate with these urban features and discuss alternative explanations and limitations.

Data Sources and Reproducibility

This analysis is designed as a fully reproducible research pipeline, with all data accessed programmatically from public sources.

Film Permit Data

The core dataset consists of NYC film permit records issued by the Mayor’s Office of Media and Entertainment (MOME), compiled from multiple publicly available NYC sources hosted on GitHub. Each permit corresponds to a permitted filming event and records its start and end times and one or more ZIP codes.

Key processing steps include:

  • Restricting to Shooting Permits only

  • Expanding permits that list multiple ZIP codes

  • Limiting the analysis window to 2013–2023

  • Aggregating permits to the ZIP-code level

Code
# FILM PERMITS CLEANING

# Written by teammate Paul (film-permit cleaning was his assigned task).
library(tidyverse)
library(lubridate)
library(dplyr)
permits <- read_csv("https://raw.githubusercontent.com/NewYorkCityCouncil/film_industry_hearing/refs/heads/master/permits.csv") 

write_csv(permits, file = "permits_data.csv")

# read.csv() (base R) turns "ZipCode(s)" into dotted names such as ZipCode.s.,
# which the renames below rely on
Recent_Permits <- read.csv("https://raw.githubusercontent.com/ppiatkow55/STA9750-2025-FALL/refs/heads/main/docs/Film_Permits_20251105.csv")

write_csv(Recent_Permits, file = "Recent_Permits.csv")

permits_found <- read_csv("https://raw.githubusercontent.com/NewYorkCityCouncil/film_industry_hearing/refs/heads/master/permits_mar23.csv")

write_csv(permits_found, file = "permits_found.csv") 

Table1 <- Recent_Permits |>
  rename_with(tolower) |>
  mutate(startdatetime = mdy_hms(startdatetime)) |>
  mutate(enddatetime = mdy_hms(enddatetime)) |>
  mutate(enteredon = mdy_hms(enteredon))

Table2 <- permits_found |>
  rename(communityboard.s. = communityboard_s,
         policeprecinct.s. = policeprecinct_s,
         zipcode.s. = zipcode_s)

combined <- full_join(Table1, Table2)

One_Recent_Permits <- Recent_Permits |>
  rename(eventid = EventID, eventtype = EventType) 


combined1 <- combined |>
  select(-parkingheld)

large_permits <- permits |>
  distinct(eventid, .keep_all = TRUE) |>
  select(-main, -cross_st_1, -cross_st_2) |>
  rename(policeprecinct.s.= policeprecinct_s, communityboard.s. = communityboard_s, zipcode.s. = zipcode_s)


clean_permits <- full_join(combined1, large_permits)

# Compare on calendar dates: mixing POSIXct and Date inside >= / <= is unreliable
clean_permits <- clean_permits |>
  filter(as_date(startdatetime) >= as.Date("2013-01-01"),
         as_date(startdatetime) <= as.Date("2023-12-31"),
         eventtype == "Shooting Permit") |>
  separate_rows(zipcode.s., sep = ",\\s*") |>
  mutate(zipcode = trimws(zipcode.s.),
         start_year = year(startdatetime),
         start_month = month(startdatetime),
         start_day = day(startdatetime),
         end_year = year(enddatetime),
         end_month = month(enddatetime),
         end_day = day(enddatetime)) |>
  select(-communityboard.s., -policeprecinct.s., -country, -zipcode.s., - enteredon, -startdatetime, -enddatetime) |>
  relocate(eventid, eventtype, start_year, start_month, start_day, end_year, end_month, end_day)

clean_permits <- clean_permits |>
  group_by(zipcode, start_year) |>
  mutate(zip_count_by_year = n()) |>
  group_by(zipcode, start_year, start_month) |>
  mutate(zip_count_by_month = n()) |>
  group_by(zipcode) |>
  mutate(zip_count_total = n()) |>
  ungroup()

clean_permits <- clean_permits |> filter(zipcode != "0")
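The key processing steps listed above (restrict to shooting permits, expand multi-ZIP permits, aggregate by ZIP) can be illustrated on a tiny table. This is a base-R sketch, not the team's pipeline; `permits_toy` and its values are hypothetical, the date-window filter is omitted, and only the column name `zipcode.s.` mirrors the real data.

```r
# Toy stand-in for the permit table (hypothetical values)
permits_toy <- data.frame(
  eventid    = 1:4,
  eventtype  = c("Shooting Permit", "Shooting Permit",
                 "Shooting Permit", "Rigging Permit"),
  zipcode.s. = c("10001, 10011", "11201", "10001", "11201"),
  stringsAsFactors = FALSE
)

# 1) Restrict to shooting permits
shoot <- permits_toy[permits_toy$eventtype == "Shooting Permit", ]

# 2) Expand: a permit listing k ZIP codes becomes k rows
zip_list <- strsplit(shoot$zipcode.s., ",\\s*")
expanded <- data.frame(
  eventid = rep(shoot$eventid, lengths(zip_list)),
  zipcode = trimws(unlist(zip_list)),
  stringsAsFactors = FALSE
)

# 3) Aggregate: permit-ZIP rows per ZIP code
permits_per_zip <- as.data.frame(table(zipcode = expanded$zipcode),
                                 responseName = "n_permits",
                                 stringsAsFactors = FALSE)
print(permits_per_zip)
```

Because a permit listing k ZIP codes contributes one row to each of them, ZIP totals count permit-ZIP pairs rather than unique permits, which is also how the `n()` counts in the cleaning code behave.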

Property Sales

Property sales data come from the New York City Department of Finance annualized sales files, which record property sales in each of the five boroughs. These data let us test whether filming-permit density can serve as an early signal, or even a catalyst, for rising real estate values in New York City neighborhoods.

Code
# This code was completed by team member Yu Yang
# ---- Setup download directory ----
library(readxl)
library(dplyr)
library(janitor)
library(stringr)

data_dir <- "data/property_sales_raw"
dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

base_url <- "https://www.nyc.gov/assets/finance/downloads/pdf/rolling_sales/annualized-sales"

years    <- 2013:2023
boroughs <- c("bronx", "brooklyn", "manhattan", "queens", "statenisland")

# Helper to try one URL and return success TRUE/FALSE
safe_download <- function(url, dest) {
  status <- tryCatch(
    download.file(url, destfile = dest, mode = "wb", quiet = TRUE),
    error = function(e) 1L
  )
  is.numeric(status) && status == 0L
}

# ---- Function to download one file ----
download_one_file <- function(year, borough) {
  # pick correct borough slug for the filename
  borough_slug <- borough
  if (borough == "statenisland" && year >= 2020) {
    borough_slug <- "staten_island"   # 2020+ files use underscore
  }

  # older years = .xls, newer years = .xlsx
  if (year <= 2017) {
    exts <- c("xls", "xlsx")
  } else {
    exts <- c("xlsx", "xls")
  }

  for (ext in exts) {
    url  <- sprintf("%s/%d/%d_%s.%s", base_url, year, year, borough_slug, ext)
    dest <- file.path(data_dir, sprintf("%d_%s.%s", year, borough_slug, ext))

    message("Trying: ", url)

    if (safe_download(url, dest)) {
      message("  ✓ downloaded: ", dest)
      return(dest)
    } else {
      message("  ✗ failed for: ", url)
    }
  }

  warning("Could not download file for ", year, " / ", borough)
  NA_character_
}


# ---- Download ALL 2013–2023 files ----
files <- unlist(
  lapply(years, function(y) {
    sapply(boroughs, download_one_file, year = y, simplify = TRUE)
  })
)

# keep only successful downloads
files <- files[!is.na(files)]

# Quick check: which years did we actually download?
download_years <- table(substr(basename(files), 1, 4))
#print(download_years)

# ---- Read and combine (using BOROUGH row as header) ----
sales_list <- lapply(files, function(f) {

  # read everything as text, no column names yet
  raw <- read_excel(
    path      = f,
    col_names = FALSE,
    col_types = "text"
  )

  # find the row that contains the real header ("BOROUGH ..."); keep only the first match
  header_row <- which(raw[[1]] == "BOROUGH")[1]
  if (is.na(header_row)) {
    header_row <- which(grepl("BOROUGH", raw[[1]]))[1]
  }
  if (is.na(header_row)) {
    stop("Could not find header row in file: ", f)
  }

  # use that row as column names, data starts on the next row
  header <- as.character(unlist(raw[header_row, ]))
  df <- raw[(header_row + 1):nrow(raw), ]
  names(df) <- header

  # drop completely empty rows/cols
  df <- df %>%
    remove_empty("rows") %>%
    remove_empty("cols")

  # add year from file name (e.g. "2019_bronx.xlsx")
  name <- basename(f)
  year <- as.numeric(substr(name, 1, 4))
  df$year <- year

  df
})

sales_2013_2023 <- bind_rows(sales_list)


# ---- Split into 3 format groups ----
sales_raw_2013_2017 <- sales_2013_2023 %>%
  filter(year >= 2013, year <= 2017)

sales_raw_2018_2019 <- sales_2013_2023 %>%
  filter(year >= 2018, year <= 2019)

sales_raw_2020_2023 <- sales_2013_2023 %>%
  filter(year >= 2020, year <= 2023)

# ---- Quick checks ----
#message("Check years in each split:")
#print(table(sales_raw_2013_2017$year))
#print(table(sales_raw_2018_2019$year))
#print(table(sales_raw_2020_2023$year))

#message("Row counts:")
#nrow(sales_raw_2013_2017)
#nrow(sales_raw_2018_2019)
#nrow(sales_raw_2020_2023)

#message("Column counts (should be similar within each group):")
#ncol(sales_raw_2013_2017)
#ncol(sales_raw_2018_2019)
#ncol(sales_raw_2020_2023)#

clean_2013_2017 <- function(df) {
  df %>%
    clean_names() %>%
    filter(borough %in% c("1","2","3","4","5")) %>%
    mutate(
      borough = case_when(
        borough == "1" ~ "Manhattan",
        borough == "2" ~ "Bronx",
        borough == "3" ~ "Brooklyn",
        borough == "4" ~ "Queens",
        borough == "5" ~ "Staten Island"
      ),
      sale_price = suppressWarnings(as.numeric(gsub("[^0-9]", "", sale_price))),
      sale_date  = suppressWarnings(as.Date(as.numeric(sale_date), origin = "1899-12-30"))
    ) %>%
    filter(!is.na(sale_price), sale_price > 10000) %>%
    select(year, borough, neighborhood, address, zip_code, sale_price, sale_date) %>%
    arrange(year, borough, sale_date)
}

sales_clean_2013_2017 <- clean_2013_2017(sales_raw_2013_2017)

# ---- Quick checks ----
#message("2013-2017 cleaned:")
#summary(sales_clean_2013_2017$sale_price)
#range(sales_clean_2013_2017$sale_date, na.rm = TRUE)
#head(sales_clean_2013_2017)


clean_2018_2019 <- function(df) {

  df %>%
    clean_names() %>%
    
    # Merge duplicate columns
    mutate(
      borough_code_raw = coalesce(borough, borough_2, borough_3),
      neighborhood_raw = coalesce(neighborhood, neighborhood_2, neighborhood_3),
      address_raw      = coalesce(address, address_2, address_3),
      zip_code_raw     = coalesce(zip_code, zip_code_2, zip_code_3),
      sale_price_raw   = coalesce(sale_price, sale_price_2, sale_price_3),
      sale_date_raw    = coalesce(sale_date, sale_date_2, sale_date_3)
    ) %>%
    
    # Borough code -> name
    mutate(
      borough_code_raw = trimws(borough_code_raw),
      borough = case_when(
        borough_code_raw == "1" ~ "Manhattan",
        borough_code_raw == "2" ~ "Bronx",
        borough_code_raw == "3" ~ "Brooklyn",
        borough_code_raw == "4" ~ "Queens",
        borough_code_raw == "5" ~ "Staten Island",
        TRUE ~ NA_character_
      )
    ) %>%
    filter(!is.na(borough)) %>%
    
    # Clean price
    mutate(
      sale_price = suppressWarnings(as.numeric(gsub("[^0-9]", "", sale_price_raw)))
    ) %>%
    
    # ---- Robust date parsing ----
    mutate(
      # Try numeric Excel serial first
      sale_date_num = suppressWarnings(as.numeric(sale_date_raw)),
      date_from_num = suppressWarnings(as.Date(sale_date_num, origin = "1899-12-30")),

      # Try character dates (mdy, ymd)
      date_from_char = suppressWarnings(parse_date_time(sale_date_raw, orders = c("mdy", "ymd"))),

      # Final date = numeric first, otherwise character-parsed
      sale_date = coalesce(date_from_num, date_from_char)
    ) %>%
    
    filter(!is.na(sale_price), sale_price > 10000, !is.na(sale_date)) %>%
    
    transmute(
      year,
      borough,
      neighborhood = neighborhood_raw,
      address      = address_raw,
      zip_code     = zip_code_raw,
      sale_price,
      sale_date
    ) %>%
    arrange(year, borough, sale_date)
}

clean_2020_2023 <- function(df) {

  df %>%
    clean_names() %>%

    # 1) Merge the duplicate columns (_2, _3) into one main set
    mutate(
      borough_code_raw = coalesce(borough, borough_2, borough_3),
      neighborhood_raw = coalesce(neighborhood, neighborhood_2, neighborhood_3),
      address_raw      = coalesce(address, address_2, address_3),
      zip_code_raw     = coalesce(zip_code, zip_code_2, zip_code_3),
      sale_price_raw   = coalesce(sale_price, sale_price_2, sale_price_3),
      sale_date_raw    = coalesce(sale_date, sale_date_2, sale_date_3)
    ) %>%

    # 2) Borough code -> borough name
    mutate(
      borough_code_raw = trimws(borough_code_raw),
      borough = case_when(
        borough_code_raw == "1" ~ "Manhattan",
        borough_code_raw == "2" ~ "Bronx",
        borough_code_raw == "3" ~ "Brooklyn",
        borough_code_raw == "4" ~ "Queens",
        borough_code_raw == "5" ~ "Staten Island",
        TRUE ~ NA_character_
      )
    ) %>%
    filter(!is.na(borough)) %>%

    # 3) Clean sale price
    mutate(
      sale_price = suppressWarnings(
        as.numeric(gsub("[^0-9]", "", sale_price_raw))
      )
    ) %>%

    # 4) Robust date parsing (Excel numbers + various text formats)
    mutate(
      sale_date_num  = suppressWarnings(as.numeric(sale_date_raw)),
      date_from_num  = suppressWarnings(as.Date(sale_date_num, origin = "1899-12-30")),
      date_from_char = suppressWarnings(
        parse_date_time(sale_date_raw, orders = c("ymd", "mdy", "dmy"))
      ),
      sale_date      = coalesce(date_from_num, date_from_char)
    ) %>%

    # 5) Keep only reasonable, non-missing sales
    filter(
      !is.na(sale_price),
      sale_price > 10000,
      !is.na(sale_date)
    ) %>%

    # 6) Final columns in same format as other years
    transmute(
      year,
      borough,
      neighborhood = neighborhood_raw,
      address      = address_raw,
      zip_code     = zip_code_raw,
      sale_price,
      sale_date    = as.Date(sale_date)
    ) %>%
    arrange(year, borough, sale_date)
}

# Create cleaned tables for each period
sales_clean_2013_2017 <- clean_2013_2017(sales_raw_2013_2017)
sales_clean_2018_2019 <- clean_2018_2019(sales_raw_2018_2019)
sales_clean_2020_2023 <- clean_2020_2023(sales_raw_2020_2023)


# Make sure sale_date is Date in every piece
sales_clean_2013_2017 <- sales_clean_2013_2017 %>%
  mutate(sale_date = as.Date(sale_date))

sales_clean_2018_2019 <- sales_clean_2018_2019 %>%
  mutate(sale_date = as.Date(sale_date))

sales_clean_2020_2023 <- sales_clean_2020_2023 %>%
  mutate(sale_date = as.Date(sale_date))

# ---- Combine all years 2013–2023 ----
sales_clean_2013_2023 <- bind_rows(
  sales_clean_2013_2017,
  sales_clean_2018_2019,
  sales_clean_2020_2023
) %>%
  arrange(year, borough, sale_date)

sales_clean_2013_2023 <- sales_clean_2013_2023 |> filter(zip_code != "0") 

# ---- Final quick checks ----
#message("Combined 2013–2023:")
#print(table(sales_clean_2013_2023$year))

#range(sales_clean_2013_2023$sale_date, na.rm = TRUE)
#summary(sales_clean_2013_2023$sale_price)
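One way to turn the cleaned sales table into a ZIP-level price trend is to take the median sale price per ZIP and year and compute the percent change between the endpoint years. The sketch below assumes a table shaped like `sales_clean_2013_2023` (columns `zip_code`, `year`, `sale_price`); `sales_toy` and its prices are hypothetical. Medians are used because sale prices are heavily right-skewed.

```r
# Hypothetical toy sales table shaped like sales_clean_2013_2023
sales_toy <- data.frame(
  zip_code   = c("11211", "11211", "11211", "11211", "10314", "10314"),
  year       = c(2013, 2013, 2023, 2023, 2013, 2023),
  sale_price = c(500000, 700000, 1100000, 1300000, 400000, 480000)
)

# Median sale price per ZIP and year
med <- aggregate(sale_price ~ zip_code + year, data = sales_toy, FUN = median)

# Reshape to one row per ZIP: sale_price_2013, sale_price_2023
wide <- reshape(med, idvar = "zip_code", timevar = "year",
                direction = "wide", sep = "_")

# Relative change in the median price between the endpoint years
wide$pct_change_price <- (wide$sale_price_2023 - wide$sale_price_2013) /
  wide$sale_price_2013
print(wide)
```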

Transit Infrastructure

Subway entrance locations are obtained from New York State's open data portal (data.ny.gov). These point locations are used to measure subway access in each ZIP code and to test whether filming hotspots have more subway entrances than other areas.

Code
library(sf)
# Subway entrances (MTA open data - direct link works Dec 2025)
subway <- read_csv("https://data.ny.gov/api/views/i9wp-a4ja/rows.csv?accessType=DOWNLOAD") |>
  st_as_sf(coords = c("Entrance Longitude", "Entrance Latitude"), crs = 4326) |>
  st_transform(26918) |>
  filter(`Constituent Station Name` != "")  # drop entrances with a blank/missing station name
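To measure transit access per ZIP, entrance points can be counted inside each ZIP polygon and normalized by land area. The sketch below uses `sf::st_intersects()` on two hypothetical 1 km squares in EPSG:26918 (the metric CRS used above); `zips` and `entrances` are invented stand-ins for the ZCTA layer and `subway`. It counts entrances rather than stations, so large multi-entrance stations weigh more.

```r
library(sf)

# Axis-aligned 1 km square with lower-left corner (x0, y0), in meters
sq <- function(x0, y0) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + 1000, y0), c(x0 + 1000, y0 + 1000),
                        c(x0, y0 + 1000), c(x0, y0))))
}

# Two hypothetical "ZIP" polygons
zips <- st_sf(zipcode = c("A", "B"),
              geometry = st_sfc(sq(0, 0), sq(1000, 0), crs = 26918))

# Three hypothetical entrances: two inside A, one inside B
entrances <- st_sf(id = 1:3,
                   geometry = st_sfc(st_point(c(200, 300)), st_point(c(700, 800)),
                                     st_point(c(1500, 500)), crs = 26918))

# Count entrances in each polygon, then normalize by area (km^2)
zips$n_entrances <- lengths(st_intersects(zips, entrances))
zips$area_km2 <- as.numeric(st_area(zips)) / 1e6
zips$entrances_per_km2 <- zips$n_entrances / zips$area_km2
print(zips)
```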

Scenic and Historic Architecture

Historic district boundaries are sourced from NYC Open Data. These districts capture neighborhoods with preserved architectural character, often associated with cultural capital and tourism appeal. ZIP codes are classified as “near historic” if they intersect or lie within 200 meters of a historic district boundary.

Code
# NYC Historic Districts (GeoJSON export from NYC Open Data)

hist_dist <- tryCatch({
  st_read("https://data.cityofnewyork.us/api/geospatial/xbvj-gfnw?method=export&format=GeoJSON",
          quiet = TRUE) |>
    st_transform(26918)
}, error = function(e) {
  warning("Historic districts download failed: ", e$message, ". Proceeding without historic layer (near_historic = FALSE). Download manually from https://data.cityofnewyork.us/Housing-Development/Historic-Districts-Map-/xbvj-gfnw")
  NULL
})
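The “near historic” rule described above (intersecting or within 200 m of a district) can be expressed with `sf::st_is_within_distance()`, which treats intersecting geometries as distance 0. This is a sketch on hypothetical squares in a metric CRS (EPSG:26918), so `dist = 200` means 200 meters; `zips` and `districts` stand in for the ZCTA layer and `hist_dist`.

```r
library(sf)

# Axis-aligned square with lower-left corner (x0, y0) and side w (meters)
sq <- function(x0, y0, w) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + w, y0), c(x0 + w, y0 + w),
                        c(x0, y0 + w), c(x0, y0))))
}

# Hypothetical ZIP polygons (1 km squares)
zips <- st_sf(zipcode = c("A", "B", "C"),
              geometry = st_sfc(sq(0, 0, 1000), sq(0, 2000, 1000),
                                sq(0, 5000, 1000), crs = 26918))

# Hypothetical districts: D1 sits 150 m above A's top edge; D2 lies inside B
districts <- st_sf(name = c("D1", "D2"),
                   geometry = st_sfc(sq(200, 1150, 200), sq(300, 2300, 200),
                                     crs = 26918))

# TRUE if a ZIP intersects or lies within 200 m of any district
near <- st_is_within_distance(zips, districts, dist = 200)
zips$near_historic <- lengths(near) > 0
print(zips[, c("zipcode", "near_historic")])
```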

Economic Context: Income and Rent (2013-2023)

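To contextualize filming hotspots within neighborhood change, I also downloaded two ZIP-level (ZCTA) datasets from the American Community Survey (ACS) 5-year estimates:

  • Median household income, table B19013 (2013–2023, excluding 2020)

  • Median gross rent, table B25064 (2013–2023, excluding 2020)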

Code
# ============================================================
# SAFE LOAD FUNCTION (server-friendly)
# ============================================================
if(!dir.exists(file.path("data", "finalproject"))){
  dir.create(file.path("data", "finalproject"), showWarnings=FALSE, recursive=TRUE)
}

library <- function(pkg){
  ## Mask base::library() to automatically install packages if needed
  ## Masking is important here so downlit picks up packages and links
  ## to documentation
  pkg <- as.character(substitute(pkg))
  options(repos = c(CRAN = "https://cloud.r-project.org"))
  if(!require(pkg, character.only=TRUE, quietly=TRUE)) install.packages(pkg)
  stopifnot(require(pkg, character.only=TRUE, quietly=TRUE))
}

library(tidyverse)
library(glue)
library(readxl)
library(tidycensus)

get_acs_all_years <- function(variable, geography="zcta",
                              start_year=2013, end_year=2023){
  fname <- glue("{variable}_{geography}_{start_year}_{end_year}.csv")
  fname <- file.path("data", "finalproject", fname)
  
  if(!file.exists(fname)){
    YEARS <- seq(start_year, end_year)
    YEARS <- YEARS[YEARS != 2020] # Drop 2020 (COVID-19 disrupted ACS data collection)
    
    ALL_DATA <- map(YEARS, function(yy){
      tidycensus::get_acs(geography, variable, year=yy, survey="acs5") |>
        mutate(year=yy) |>
        select(-moe, -variable) |>
        rename(!!variable := estimate)
    }) |> bind_rows()
    
    write_csv(ALL_DATA, fname)
  }
  
  read_csv(fname, show_col_types=FALSE)
}
#Filter for NYC ZIPs
NYC_ZCTAS <- c(
  # Bronx (005)
  "10451", "10452", "10453", "10454", "10455", "10456", "10457", "10458", 
  "10459", "10460", "10461", "10462", "10463", "10464", "10465", "10466", 
  "10467", "10468", "10469", "10470", "10471", "10472", "10473", "10474", 
  "10475",
  # Brooklyn (047) - Kings County
  "11201", "11203", "11204", "11205", "11206", "11207", "11208", "11209", 
  "11210", "11211", "11212", "11213", "11214", "11215", "11216", "11217", 
  "11218", "11219", "11220", "11221", "11222", "11223", "11224", "11225", 
  "11226", "11228", "11229", "11230", "11231", "11232", "11233", "11234", 
  "11235", "11236", "11237", "11238", "11239",
  # Manhattan (061) - New York County
  "10001", "10002", "10003", "10004", "10005", "10006", "10007", "10009", 
  "10010", "10011", "10012", "10013", "10014", "10016", "10017", "10018", 
  "10019", "10020", "10021", "10022", "10023", "10024", "10025", "10026", 
  "10027", "10028", "10029", "10030", "10031", "10032", "10033", "10034", 
  "10035", "10036", "10037", "10038", "10039", "10040", "10044",
  # Queens (081)
  "11101", "11102", "11103", "11104", "11105", "11106", "11354", "11355", 
  "11356", "11357", "11358", "11360", "11361", "11362", "11363", "11364", 
  "11365", "11366", "11367", "11368", "11369", "11370", "11371", "11372", 
  "11373", "11374", "11375", "11377", "11378", "11379", "11385", "11411", 
  "11412", "11413", "11414", "11415", "11416", "11417", "11418", "11419", 
  "11420", "11421", "11422", "11423", "11426", "11427", "11428", "11429", 
  "11432", "11433", "11434", "11435", "11436", "11691", "11692", "11693", 
  "11694", "11695", "11697",
  # Staten Island (085) - Richmond County
  "10301", "10302", "10303", "10304", "10305", "10306", "10307", "10308", 
  "10309", "10310", "10311", "10312", "10314"
)
# Median household income in the past 12 months (B19013_001)
INCOME <- get_acs_all_years("B19013_001") |>
  rename(household_income = B19013_001)
INCOME <- INCOME |>
  filter(GEOID %in% NYC_ZCTAS)

# Median gross rent (B25064_001)
RENT <- get_acs_all_years("B25064_001") |>
  rename(monthly_rent = B25064_001)
RENT  <- RENT  |>
  filter(GEOID %in% NYC_ZCTAS)

# Combine tables
JOIN_TABLE <- INCOME |>
  left_join(RENT, by = c("GEOID", "year","NAME" )) |>
  rename(zipcode = "GEOID") |>
  select(-NAME)

Analytical Strategy

The central challenge of this analysis is to isolate spatial clustering mechanisms rather than merely documenting where filming occurs.

Three complementary strategies are used:

  1. Total permits per ZIP (2013–2023)

  2. Density of subway stations per ZIP

  3. Compare top filming ZIPs with a curated list of known “hotspot” neighborhoods

This layered approach helps distinguish whether filming clusters are driven by logistical convenience, aesthetic value, or recent redevelopment, rather than random or purely administrative factors.
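For strategy 3, one hedged way to judge whether the overlap between the top filming ZIPs and the curated list exceeds chance is a hypergeometric tail probability. This test is an illustration swapped in here, not part of the original analysis; `permit_counts`, `curated`, and `top_n` are hypothetical.

```r
# Hypothetical permit totals for a small universe of ZIPs
permit_counts <- c("10001" = 900, "10011" = 850, "11201" = 800, "10036" = 700,
                   "11211" = 650, "10314" = 40, "11432" = 30, "10451" = 25)
curated <- c("10001", "10011", "11201", "11211", "10014")

# Top-N ZIPs by permits and their overlap with the curated list
top_n <- 4
top_zips <- names(sort(permit_counts, decreasing = TRUE))[seq_len(top_n)]
overlap <- sum(top_zips %in% curated)

# If top_n ZIPs were drawn at random from N, overlap ~ hypergeometric
N <- length(permit_counts)
K <- sum(curated %in% names(permit_counts))  # curated ZIPs present in the universe
p_value <- phyper(overlap - 1, K, N - K, top_n, lower.tail = FALSE)  # P(X >= overlap)
c(overlap = overlap, p_value = p_value)
```

The null treats ZIPs as exchangeable and ignores spatial autocorrelation; and because the curated list is built from neighborhoods already known for filming, a small p-value would be weak evidence at best.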

Spatial Feature Construction

Initial mapping of film permits reveals visually striking clustering in Manhattan south of 59th Street, with secondary clusters in Brooklyn neighborhoods such as DUMBO and Williamsburg and in Long Island City, Queens. Outer-borough filming exists but is more diffuse (see the figures under Results).

Code
library(tidyverse)
library(sf)
library(tidycensus)
library(tigris)
library(scales)
library(ggspatial)
library(patchwork)
library(ggplot2)
# Get NYC ZCTA geometry
nyc_zctas_sf <- zctas(cb = TRUE, year = 2020, progress_bar = FALSE) |>
  filter(GEOID20 %in% NYC_ZCTAS) |>
  select(zipcode = GEOID20) |>
  st_transform(26918)  # NAD83 / UTM zone 18N (meters)

film_by_zip <- clean_permits |>
  filter(start_year >= 2013 & start_year <= 2023) |>
  group_by(zipcode) |>
  summarise(
    n_permits = n(),
    n_years_active = n_distinct(start_year),
    .groups = "drop"
  ) |>
  mutate(zipcode = as.character(zipcode))

# Join economic variables (2023 ACS vintage only, so each ZIP polygon appears once)
zcta_data <- nyc_zctas_sf |>
  left_join(JOIN_TABLE |> filter(year == 2023), by = "zipcode") |>
  left_join(film_by_zip, by = "zipcode") |>
  replace_na(list(n_permits = 0))

# Compute gentrification proxy: % increase in median household income 2013–2023
gentrification <- JOIN_TABLE |>
  filter(year %in% c(2013, 2023)) |>
  select(zipcode, year, household_income) |>
  pivot_wider(names_from = year, values_from = household_income, names_prefix = "inc_") |>
  mutate(pct_change_income = (inc_2023 - inc_2013) / inc_2013) |>
  select(zipcode, pct_change_income)

zcta_final <- zcta_data |>
  left_join(gentrification, by = "zipcode") 
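The gentrification proxy computed above is the relative change in median household income for ZIP $z$ between the 2013 and 2023 ACS 5-year vintages:

$$
\Delta I_z = \frac{I_{z,2023} - I_{z,2013}}{I_{z,2013}}
$$

For example, a ZIP whose median rose from 60,000 to 90,000 dollars has $\Delta I_z = 30{,}000 / 60{,}000 = 0.5$, or 50%. Because each 5-year vintage reports dollars for its own final year, this is a nominal change, part of which reflects inflation rather than neighborhood upgrading; the 2013 vintage also pools responses from 2009 to 2013.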

Results

Filming and Transit Access

Mapping film permit intensity alongside subway entrances reveals a striking pattern: high-permit ZIP codes are almost universally well-served by subway infrastructure.

ZIP codes in the top tier of filming intensity are, on average, substantially closer to subway entrances than the citywide median ZIP code. Peripheral ZIP codes with limited transit access record far fewer permits, even after normalizing by land area.
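The distance comparison above can be sketched as the distance from each ZIP's centroid to its nearest subway entrance, comparing the median for top-filming ZIPs with the citywide median. This base-R sketch assumes coordinates already projected to meters (as EPSG:26918 is) and uses invented points; `zip_xy`, `ent_xy`, and the "top 2 by permits" cutoff are hypothetical. With real `sf` layers, `st_nearest_feature()` and `st_distance()` do the same job, and centroids remain a crude summary for large outer-borough ZIPs.

```r
# Hypothetical ZIP centroids (meters) with permit totals
zip_xy <- data.frame(
  zipcode   = c("A", "B", "C", "D"),
  x         = c(0, 1000, 5000, 9000),
  y         = c(0, 0, 0, 0),
  n_permits = c(500, 300, 20, 5)
)
# Hypothetical subway entrances (meters)
ent_xy <- cbind(x = c(100, 1000, 6000), y = c(0, 300, 0))

# Euclidean distance from each centroid to its nearest entrance
nearest_m <- apply(zip_xy[, c("x", "y")], 1, function(p) {
  min(sqrt((ent_xy[, "x"] - p[["x"]])^2 + (ent_xy[, "y"] - p[["y"]])^2))
})

# Hypothetical "top tier": the two ZIPs with the most permits
top <- rank(-zip_xy$n_permits) <= 2
top_median  <- median(nearest_m[top])
city_median <- median(nearest_m)
c(top_tier = top_median, citywide = city_median)
```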

This supports the hypothesis that logistical efficiency matters: production crews, equipment trucks, and talent movement all benefit from dense transit networks. However, transit access alone is insufficient. Many transit-rich ZIPs (e.g., purely residential outer-borough areas) still exhibit modest filming activity, suggesting additional filters are at work.

Code
# Visualizations
library(plotly)
p1 <- ggplot(zcta_final) +
  geom_sf(aes(fill = n_permits,
              text = paste("ZIP:", zipcode,
                           "<br>Permits:", n_permits)),
          color = "white", linewidth = 0.2) +
  scale_fill_viridis_c(name = "Total Film Permits\n2013–2023", 
                       option = "magma", trans = "sqrt") +
  geom_sf(data = subway, size = 0.4, color = "#00aaff", alpha = 0.6) +
  labs(title = "Film Permits + Subway Stations") +
  theme_void() + 
  theme(legend.position = "bottom")

# Make it interactive
ggplotly(p1, tooltip = "text")

Key patterns observed:

1. High-filming ZIP codes are tightly aligned with dense subway networks.

Manhattan south of 59th Street appears as a continuous high-intensity corridor, with especially strong clustering in Midtown, Lower Manhattan, and adjacent Brooklyn neighborhoods. These areas also show the highest concentration of subway entrances.

2. Peripheral ZIP codes exhibit sharply lower filming intensity. Large outer-borough ZIP codes with sparse subway coverage (e.g., eastern Queens, southern Staten Island) appear light-colored, indicating very few permits despite substantial land area.

3. Transit access is necessary but not sufficient. While nearly all high-filming ZIPs are transit-rich, the reverse is not true: several residential, transit-accessible ZIPs still show modest filming activity.

This figure is consistent with the hypothesis that logistical accessibility is a foundational requirement for filming hotspots. Film production is labor- and equipment-intensive, making proximity to subway infrastructure a major cost-saving factor. However, because many transit-rich ZIPs do not become filming hotspots, transit access alone cannot explain clustering. This motivates examining additional filters such as aesthetics and redevelopment.
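The "necessary but not sufficient" argument can be made concrete with two conditional shares: the share of high-filming ZIPs that are transit-rich (close to 1 if transit access is necessary) and the share of transit-rich ZIPs that are high-filming (well below 1 if it is not sufficient). The classification flags in this base-R sketch are hypothetical.

```r
# Hypothetical ZIP classifications
zips <- data.frame(
  zipcode      = c("A", "B", "C", "D", "E", "F"),
  transit_rich = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE),
  high_filming = c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
)

# Share of high-filming ZIPs that are transit-rich ("necessary" check)
p_transit_given_film <- mean(zips$transit_rich[zips$high_filming])
# Share of transit-rich ZIPs that are high-filming ("sufficient" check)
p_film_given_transit <- mean(zips$high_filming[zips$transit_rich])
c(transit_given_film = p_transit_given_film,
  film_given_transit = p_film_given_transit)
```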

Code
# Curated hotspot ZIPs (scenic/historic and redeveloped/gentrifying)
hotspot_zips <- c(
  # Scenic/Historic: DUMBO/Brooklyn Heights, Greenwich Village/West Village/SoHo, Hudson Yards, West Village, East Village
  "11201", "10011","10012", "10013", "10014", "10001", "10003",
  # Redeveloped/Gentrifying: Williamsburg, Hell's Kitchen, Meatpacking, Greenpoint, Long Island City
  "11211", "10036", "10018", "11222", "11101"
)

top10 <- clean_permits %>%
  filter(!is.na(zipcode), zipcode != "") %>%
  count(zipcode, name = "permit_count") %>%
  arrange(desc(permit_count)) %>%
  slice_head(n = 10) %>%
  mutate(is_hotspot = zipcode %in% hotspot_zips)  
# Join film data to geometry
zcta_film <- nyc_zctas_sf |>
  left_join(film_by_zip, by = "zipcode") |>
  mutate(
    n_permits = coalesce(n_permits, 0),
    area_km2 = as.numeric(st_area(geometry)) / 1e6,
    permits_per_km2 = n_permits / area_km2,
    is_hotspot = zipcode %in% hotspot_zips
  )
# Use the film-joined geometry built above when it exists
zcta_sf <- if (exists("zcta_film")) zcta_film else NULL

# If no geometry exists yet, load 2020 ZCTA boundaries from tigris
if (is.null(zcta_sf)) {
  library(tigris)
  # cb = TRUE gives generalized geometry; good for quick maps
  zcta_sf <- tigris::zctas(cb = TRUE, year = 2020, progress_bar = FALSE) %>%
    st_as_sf() %>%
    select(zipcode = GEOID20)
}

# If still don't have geometry, skip map (but keep table + histogram)
if (!is.null(zcta_sf)) {
  
  # Make sure to have a consistent ZIP key (GEOID is typical for ZCTAs)
  zcta_sf <- zcta_sf %>%
    mutate(zipcode = if ("GEOID" %in% names(.)) GEOID else zipcode)
  
  # Join top10 counts onto geometry (leave others NA so background is lighter)
  map_df <- zcta_sf %>%
    left_join(top10, by = "zipcode") %>%
    mutate(
      is_top10   = !is.na(permit_count),
      is_hotspot = zipcode %in% hotspot_zips
    )
  #restrict to NYC-like ZIPs to avoid continental clutter
  nyc_like <- unique(c(top10$zipcode, hotspot_zips))
  map_df <- map_df %>% filter(zipcode %in% nyc_like)
 
  # ----------------------------
  # Map plot (top-10 colored; hotspot outlines)
  # ----------------------------
  p_map <- ggplot(map_df) +
    # base: show all included NYC zips in light gray (missing counts -> light)
    geom_sf(aes(fill = permit_count), color = "gray90", linewidth = 0.15, alpha = 0.9) +
    scale_fill_viridis_c(option = "C", na.value = "gray95", direction = 1) +
    # highlight hotspot zips with a clear border (on top)
    geom_sf(
      data = subset(map_df, is_hotspot),
      fill = NA, color = "red", linewidth = 0.7
    ) +
    # label top-10 zipcodes (helps read the map at a glance)
    geom_sf_text(
      data = subset(map_df, is_top10),
      aes(label = zipcode),
      size = 2.6, color = "black", fontface = "bold"
    ) +
    labs(
      title = "Top 10 Shooting-Permit ZIP Codes (2013–2023)",
      subtitle = "Color = permit count (top-10), red outline = curated hotspot ZIPs",
      fill = "Permits"
    ) +
    theme_minimal()
  
  print(p_map)
}

Key patterns observed:

1. Substantial overlap between empirical top-10 ZIPs and known hotspots.

A majority of the top-performing ZIP codes fall within neighborhoods commonly recognized as visually distinctive or recently redeveloped (e.g., SoHo, West Village, Williamsburg, DUMBO).

2. Clustering is geographically compact rather than dispersed. Top ZIPs are spatially adjacent rather than scattered, forming recognizable filming corridors across Lower Manhattan and inner Brooklyn.

3. Outliers are limited and interpretable. ZIPs that appear in the top 10 but are not on the curated hotspot list still tend to border or connect directly to hotspot areas, suggesting spatial spillovers.

This figure suggests that filming hotspots are not statistical artifacts but correspond closely to well-known urban typologies. The strong overlap reinforces the idea that filming concentrates in neighborhoods combining accessibility, symbolic capital, and urban amenities. The spatial adjacency of top ZIPs further suggests that filming benefits from contiguous, production-friendly environments rather than isolated locations.

Scenic and Historic Architecture

ZIP codes intersecting or adjacent to historic districts show disproportionately high filming intensity, even after adjusting for land area. Examples include:

  • Lower Manhattan / Midtown Manhattan

  • Greenwich Village / West Village / East Village

  • Brooklyn Heights / DUMBO / Long Island City

This finding aligns with qualitative industry knowledge: historic streetscapes offer visual richness, period flexibility, and cinematic authenticity that newer developments often lack. Importantly, this mechanism is distinct from transit access. Some historic districts are not the most transit-rich locations, yet still attract substantial filming, indicating an aesthetic premium.
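A size-adjusted comparison like the one described above can be sketched by dividing permits by land area and comparing medians between ZIPs flagged near a historic district and the rest. All values below are hypothetical; `near_historic` is assumed to come from a 200 m rule such as the one described in the data section.

```r
# Hypothetical ZIP-level inputs
zips <- data.frame(
  zipcode       = c("A", "B", "C", "D", "E"),
  n_permits     = c(1200, 800, 300, 150, 60),
  area_km2      = c(2, 2.5, 3, 5, 12),
  near_historic = c(TRUE, TRUE, FALSE, TRUE, FALSE)
)

# Size adjustment: permits per square kilometer
zips$permits_per_km2 <- zips$n_permits / zips$area_km2

# Median density per group (medians resist a few extreme ZIPs)
med_density <- tapply(zips$permits_per_km2, zips$near_historic, median)
print(med_density)
```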

Code
library(scales)  # for percent()

p2 <- ggplot(zcta_final) +
  geom_sf(aes(fill = pct_change_income,
              text = paste("ZIP:", zipcode,
                           "<br>Δ Income:", percent(pct_change_income, accuracy = 0.1))),
          color = "gray80", linewidth = 0.1) +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0, labels = percent, 
                       name = "Δ Median HH Income\n2013–2023") +
  # skip the overlay if the historic-district download failed (hist_dist is NULL)
  (if (!is.null(hist_dist)) geom_sf(data = hist_dist, fill = NA, color = "forestgreen", linewidth = 1.2)) +
  labs(title = "Gentrification + Historic Districts") +
  theme_void()

# Convert to interactive
ggplotly(p2, tooltip = "text")

This map visualizes percentage change in median household income (2013–2023) by ZIP code, overlaid with officially designated historic district boundaries.

Key patterns observed

1. Historic districts frequently coincide with above-median income growth. Many historic areas—particularly in Manhattan and brownstone Brooklyn—appear in lighter or warmer colors, indicating income growth at or above the citywide median.

2. Filming hotspots tend to lie at the intersection of history and growth. ZIP codes that are both near historic districts and experiencing income growth are disproportionately represented among high-filming areas.
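One way to check the "intersection of history and growth" claim is to tabulate, within each combination of historic proximity and income growth, the share of ZIPs that are high-filming. The sketch below is a hypothetical helper, `high_filming_share()`; the cutoffs (top quartile of permit density for "high filming", above the citywide median for "growth") are assumptions rather than the report's stated definitions, and the toy data are illustrative.

```r
# Hypothetical sketch: share of high-filming ZIPs in each history x growth group.
# ASSUMPTIONS: high filming = top quartile of density;
# growth = income change above the citywide median.
high_filming_share <- function(density, income_growth, near_historic) {
  high    <- density >= unname(quantile(density, 0.75, na.rm = TRUE))
  growth  <- ifelse(income_growth > median(income_growth, na.rm = TRUE),
                    "above-median growth", "at/below-median growth")
  history <- ifelse(near_historic, "near historic", "not near historic")
  # Mean of a logical vector = share of TRUE; empty groups would show NA
  tapply(high, list(history, growth), mean)
}

# Toy inputs for illustration only
toy <- data.frame(
  density       = c(200, 150, 120, 30, 20, 15, 10, 5),
  income_growth = c(0.30, 0.25, 0.05, 0.20, 0.02, 0.22, 0.01, 0.03),
  near_historic = c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
)
shares <- high_filming_share(toy$density, toy$income_growth, toy$near_historic)
print(shares)
```

If the claim holds, the "near historic" by "above-median growth" cell should show the largest share.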

Historic and scenic architecture contributes meaningfully to filming concentration, but it is not the sole driver. Formal landmark designation captures only part of the aesthetic logic guiding location selection.

Filming and Redevelopment / Gentrification

When filming intensity is overlaid with income growth and property value appreciation, a nuanced pattern emerges:

  • Many filming hotspots experienced above-median income and price growth between 2013 and 2023.

  • Examples include Williamsburg, Long Island City, and Hudson Yards–adjacent ZIP codes.

  • However, not all gentrifying ZIPs attract heavy filming, nor are all filming ZIPs rapidly gentrifying.

This suggests that filming activity is selective within gentrifying areas, favoring neighborhoods that combine accessibility, visual appeal, and a degree of safety and institutional stability.
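The mismatch described in the bullets can be made explicit by sorting ZIPs into four groups: gentrifying with heavy filming, gentrifying only, heavy filming only, and neither. A rank correlation then summarizes the overall association. The sketch below is hypothetical. It defines "gentrifying" as above-median growth in both income and price and "heavy filming" as above-median permit density, which are assumed cutoffs, and it uses toy data.

```r
# Hypothetical sketch: cross-classify ZIPs by gentrification and filming.
# ASSUMPTIONS: gentrifying = above-median income AND price growth;
# heavy filming = above-median permit density.
classify_quadrants <- function(density, income_growth, price_growth) {
  gentrifying <- income_growth > median(income_growth, na.rm = TRUE) &
                 price_growth  > median(price_growth,  na.rm = TRUE)
  heavy <- density > median(density, na.rm = TRUE)
  labels <- ifelse(gentrifying & heavy, "gentrifying + heavy filming",
            ifelse(gentrifying,          "gentrifying only",
            ifelse(heavy,                "heavy filming only", "neither")))
  factor(labels, levels = c("gentrifying + heavy filming", "gentrifying only",
                            "heavy filming only", "neither"))
}

# Toy inputs for illustration only
toy <- data.frame(
  density       = c(90, 80, 10, 70, 5, 20),
  income_growth = c(0.30, 0.05, 0.25, 0.02, 0.01, 0.28),
  price_growth  = c(0.40, 0.35, 0.30, 0.03, 0.02, 0.01)
)
quad <- classify_quadrants(toy$density, toy$income_growth, toy$price_growth)
print(table(quad))

# Spearman (rank) correlation: a monotone association that is robust to skew
rho <- cor(toy$density, toy$income_growth, method = "spearman")
print(rho)
```

Non-empty "gentrifying only" and "heavy filming only" cells correspond to the observation that neither set fully contains the other.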

Hotspot Analysis

To ground the analysis, I identify the top 10 ZIP codes by filming permits and compare them against a curated list of known scenic, historic, or redeveloped neighborhoods.

Scenic/Historic: DUMBO/Brooklyn Heights (11201), Greenwich Village/West Village/SoHo (10011–10013), Hudson Yards (10001), West Village (10014), East Village (10003)

Redeveloped/Gentrifying: Williamsburg (11211), Hell’s Kitchen (10036), Meatpacking (10018), Greenpoint (11222), Long Island City (11101)
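The code below relies on a `hotspot_zips` vector that is not defined in this section. A plausible reconstruction from the two lists above, expanding the range 10011–10013 into its three ZIPs, might look like the following. This is an assumption: the author's actual definition may differ, for example by storing ZIPs as numbers. `hotspot_share()` is a hypothetical helper for computing how much of a ZIP set falls in the curated list.

```r
# ASSUMED reconstruction of the curated hotspot list from the text above.
# Stored as character so ZIP codes compare as strings.
scenic_historic <- c("11201", "10011", "10012", "10013", "10001", "10014", "10003")
redeveloped     <- c("11211", "10036", "10018", "11222", "11101")
hotspot_zips    <- unique(c(scenic_historic, redeveloped))

# Hypothetical helper: share of a set of ZIPs that falls in the curated list
hotspot_share <- function(top_zips, hotspots = hotspot_zips) {
  mean(as.character(top_zips) %in% hotspots)
}

# Example with arbitrary ZIPs: two of the four are in the list
hotspot_share(c("11201", "10019", "10036", "10022"))
```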

Code
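library(dplyr)    # filter(), count(), arrange(), slice_head(), mutate()
library(stringr)  # str_replace_all(), str_to_title()
library(DT)       # datatable()

# Ten ZIP codes with the most filming permits, flagged against the curated hotspot list
top10 <- clean_permits %>%
  filter(!is.na(zipcode), zipcode != "") %>%
  count(zipcode, name = "permit_count") %>%
  arrange(desc(permit_count)) %>%
  slice_head(n = 10) %>%
  mutate(is_hotspot = zipcode %in% hotspot_zips)

# Turn snake_case column names into readable titles (e.g. "permit_count" -> "Permit Count")
format_titles <- function(df) {
  colnames(df) <- str_replace_all(colnames(df), "_", " ") |> str_to_title()
  df
}

top10 |>
  format_titles() |>
  datatable(caption = "Top 10 Permit ZIPs by Count",
            options = list(searching = FALSE, info = FALSE))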
Code
library(ggplot2)

# Bar chart of the top-10 ZIPs, ordered by permit count and colored by hotspot membership
p_hist <- top10 %>%
  ggplot(aes(x = reorder(zipcode, permit_count), y = permit_count, fill = is_hotspot)) +
  geom_col(width = 0.8) +
  scale_fill_manual(values = c("FALSE" = "steelblue", "TRUE" = "tomato")) +
  labs(
    title = "Top 10 Permit ZIPs: Count Distribution",
    x = "ZIP code (sorted by permits)",
    y = "Permit count",
    fill = "In hotspot list?"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(p_hist)

A majority of the top filming ZIPs fall within the predefined hotspot list. In the bar chart, most of the highest-permit ZIPs are shaded as hotspots ("TRUE"), indicating alignment with scenic, historic, or redeveloped neighborhoods. Filming clusters therefore appear to track recognizable urban typologies rather than arising incidentally.

Permit counts are highly skewed. Even among the top 10, filming intensity varies substantially, consistent with extreme concentration.

Non-hotspot ZIPs still show elevated activity. A small number of top ZIPs fall outside the curated list, suggesting either emerging hotspots or logistical spillovers from adjacent areas.
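The skewness claim can be quantified with two standard concentration measures applied to the full vector of permit counts per ZIP (for example, the output of `count(zipcode)` before taking the top 10): the share of all permits captured by the k busiest ZIPs, and the Gini coefficient. The helper names `top_k_share()` and `gini()` are hypothetical. The Gini calculation uses the usual sorted-rank formula, and the inputs below are toy values.

```r
# Hypothetical helper: share of all permits captured by the k busiest ZIPs
top_k_share <- function(counts, k = 10) {
  counts <- sort(counts, decreasing = TRUE)
  sum(head(counts, k)) / sum(counts)
}

# Hypothetical helper: Gini coefficient of permit counts across ZIPs.
# 0 = permits spread evenly; (n - 1) / n = all permits in a single ZIP.
gini <- function(counts) {
  x <- sort(counts)
  n <- length(x)
  sum((2 * seq_len(n) - n - 1) * x) / (n * sum(x))
}

# Toy inputs for illustration only
top_k_share(c(50, 30, 10, 5, 5), k = 2)  # top two ZIPs hold 80/100 permits
gini(c(5, 5, 5, 5))                      # perfectly even
gini(c(0, 0, 0, 10))                     # maximally concentrated for n = 4
```

A top-10 share far above 10 divided by the number of ZIPs, or a Gini close to 1, would support the description of extreme concentration.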

Discussion and Objections

The analysis provides a clear and nuanced answer to the SQ: Yes, filming hotspots cluster around transit infrastructure, scenic architecture, and recently redeveloped areas, but these factors operate jointly rather than independently.

Objection 1: “Filming just follows permits, not neighborhoods.” While permitting constraints exist, they do not explain persistent clustering across a decade. Many ZIPs are legally filmable but rarely selected, suggesting active preference rather than passive availability.

Objection 2: “Transit access explains everything.” Transit access is necessary but not sufficient. Several transit-rich ZIPs lack filming activity, while some historic areas with modest transit still attract heavy filming. This indicates multi-dimensional selection.

Objection 3: “Filming does not cause gentrification.” This analysis does not claim direct causality. Instead, it positions filming as a signal and potential amplifier. Film production appears to recognize neighborhoods already valuable along cultural, logistical, or economic dimensions — and may reinforce their visibility.

Implications for the Overarching Question

By demonstrating that filming hotspots cluster around transit, scenic architecture, and redevelopment, this report strengthens the case for treating film permits as a potential leading indicator of neighborhood desirability.

If film production systematically selects neighborhoods with rising incomes and prices, then permit data may serve as an early warning system for real estate change — particularly when combined with temporal analysis in future work.

Conclusion

This individual analysis provides a crucial spatial mechanism linking film production to neighborhood change. Filming hotspots in New York City are not randomly distributed: they cluster in places that are accessible, visually distinctive, and often economically ascending.

While filming alone does not cause gentrification, it appears deeply embedded in the same urban dynamics that drive rising real estate values. As such, film permit data deserve serious consideration as both an analytical tool and a policy-relevant signal in urban economic research.